@SamMorrowDrums
Collaborator

Summary

Adds an optional symbol parameter to the get_file_contents tool that uses tree-sitter to extract a specific named symbol from a file. Instead of returning the entire file contents, only the matching symbol's source code is returned.

Example usage

{
  "owner": "github",
  "repo": "github-mcp-server",
  "path": "pkg/github/repositories.go",
  "symbol": "GetFileContents"
}

Returns just the GetFileContents function definition instead of the entire 900+ line file.

How it works

  1. File is fetched normally via the GitHub Contents API
  2. If symbol parameter is provided and the file is a supported language, tree-sitter parses the source
  3. Searches top-level declarations first, then nested declarations (methods inside classes, etc.)
  4. Returns the symbol's source code with its kind (e.g. function_declaration, method_definition)
  5. If the symbol is not found, returns an error listing all available symbols
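The lookup order in steps 3 to 5 can be sketched without tree-sitter. This is a minimal illustration, not the PR's code: the `Declaration` type, `lookup`, `extractSymbolSketch`, and `sampleDecls` are hypothetical stand-ins, and the parsed declarations are hard-coded rather than produced by a parser. The not-found error here lists only top-level names.

```go
package main

import (
	"fmt"
	"strings"
)

// Declaration is a hypothetical stand-in for one parsed declaration.
// In the PR these come from tree-sitter; here they are hard-coded.
type Declaration struct {
	Name     string
	Kind     string
	Text     string
	Children []Declaration
}

// lookup returns the first declaration in decls with the given name.
func lookup(decls []Declaration, name string) (Declaration, bool) {
	for _, d := range decls {
		if d.Name == name {
			return d, true
		}
	}
	return Declaration{}, false
}

// extractSymbolSketch checks top-level declarations first, then the
// declarations nested directly inside each top-level declaration.
// If nothing matches, the error lists the top-level names.
func extractSymbolSketch(decls []Declaration, name string) (text, kind string, err error) {
	if d, ok := lookup(decls, name); ok {
		return d.Text, d.Kind, nil
	}
	for _, top := range decls {
		if d, ok := lookup(top.Children, name); ok {
			return d.Text, d.Kind, nil
		}
	}
	names := make([]string, 0, len(decls))
	for _, d := range decls {
		names = append(names, d.Name)
	}
	return "", "", fmt.Errorf("symbol %q not found. Available symbols: %s",
		name, strings.Join(names, ", "))
}

// sampleDecls models a small JavaScript file with one function and one class.
func sampleDecls() []Declaration {
	return []Declaration{
		{Name: "greet", Kind: "function_declaration", Text: "function greet() {}"},
		{
			Name: "Animal", Kind: "class_declaration", Text: "class Animal { speak() {} }",
			Children: []Declaration{
				{Name: "speak", Kind: "method_definition", Text: "speak() {}"},
			},
		},
	}
}

func main() {
	text, kind, err := extractSymbolSketch(sampleDecls(), "speak")
	fmt.Println(text, kind, err)

	_, _, err = extractSymbolSketch(sampleDecls(), "missing")
	fmt.Println(err)
}
```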

Supported languages

Go, Python, JavaScript, TypeScript, Ruby, Rust, Java, C/C++ — reuses the tree-sitter configs from the structural diff engine.

Pairs with structural diffs

The two tools combine into a two-step review workflow:

  1. compare_file_contents shows which symbols changed (structural diff)
  2. get_file_contents with symbol fetches just the specific symbol to examine

Testing

  • 12 unit tests covering all major languages
  • Tests for top-level symbols, nested symbols (class methods), error cases
  • Existing get_file_contents tests still pass (parameter is optional)
  • Toolsnap updated

Dependencies

Stacked on #1982 (tree-sitter structural diff) → #1981 (semantic data diffs)


Part of #1973

Adds an optional 'symbol' parameter to get_file_contents that uses
tree-sitter to extract a specific named symbol (function, class, type,
method, etc.) from a file. Instead of returning the entire file, only
the matching symbol's source code is returned.

Supports all languages from the structural diff engine: Go, Python,
JavaScript, TypeScript, Ruby, Rust, Java, C/C++. For unsupported
file types, returns an error suggesting the feature is not available.

If the symbol is not found, the error message includes a list of
available symbols in the file to help the model self-correct.

This pairs well with the structural diff tool — a model can see which
symbols changed via compare_file_contents, then fetch specific symbols
via get_file_contents to examine them in detail.
@SamMorrowDrums SamMorrowDrums requested a review from a team as a code owner February 9, 2026 22:54
Copilot AI review requested due to automatic review settings February 9, 2026 22:54
…n docs

Documents the tree-sitter structural diff engine, compare_file_contents
tool, symbol extraction via get_file_contents, CGO requirement, and
how to add new language support. Also updates build commands to include
CGO_ENABLED=1.
Symbol text is just code — a ResourceContents wrapper with URI/MIME
type adds no value. Use NewToolResultText for a simpler, more natural
response.
Contributor

Copilot AI left a comment

Pull request overview

This PR extends the get_file_contents MCP tool to optionally return only the source code for a named symbol (function/class/type/etc.) by reusing the repo’s existing tree-sitter language configs.

Changes:

  • Adds a new optional symbol input parameter to get_file_contents and performs symbol extraction for text/code files.
  • Introduces ExtractSymbol helper implementation for symbol lookup using tree-sitter declarations.
  • Adds unit tests for symbol extraction across several languages and updates the tool schema snapshot.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/github/repositories.go | Adds symbol parameter parsing and returns extracted symbol source instead of full file content when requested. |
| pkg/github/symbol_extraction.go | New symbol extraction helper built on top of the existing tree-sitter declaration extraction utilities. |
| pkg/github/symbol_extraction_test.go | New unit tests covering symbol extraction behavior across multiple languages and error cases. |
| pkg/github/__toolsnaps__/get_file_contents.snap | Updates tool schema snapshot to include the new symbol parameter. |

Comments suppressed due to low confidence (1)

pkg/github/repositories.go:658

  • Repository docs list tool parameters in README, and get_file_contents currently documents params only up through sha. Since this PR adds a new symbol input, regenerate and commit the README/docs output so the published tool docs match the updated schema.
					"symbol": {
						Type:        "string",
						Description: "Optional: extract a specific symbol (function, class, type, etc.) from the file. For supported languages, returns only the symbol's source code instead of the entire file. If the symbol is not found, returns a list of available symbols.",
					},

Comment on lines +655 to 660
"symbol": {
Type: "string",
Description: "Optional: extract a specific symbol (function, class, type, etc.) from the file. For supported languages, returns only the symbol's source code instead of the entire file. If the symbol is not found, returns a list of available symbols.",
},
},
Required: []string{"owner", "repo"},
Copilot AI Feb 9, 2026

Adding the symbol property changes the tool input schema, but Test_GetFileContents in repositories_test.go currently asserts the old schema keys (it checks for sha but not symbol). That test will fail once this PR is merged; update the schema assertions and consider adding a tool-level test case that passes symbol and verifies the extracted symbol is returned.

This issue also appears on line 655 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +11
// ExtractSymbol searches source code for a named symbol and returns its text.
// It searches top-level declarations first, then recursively searches nested
// declarations (e.g. methods inside classes). Returns the symbol text and its
// kind, or an error if the symbol is not found or the language is unsupported.
Copilot AI Feb 9, 2026

The doc comment says nested declarations are searched "recursively", but this implementation only checks top-level declarations and then the declarations nested within each top-level declaration. Either adjust the comment to match the actual depth supported, or extend the search to recurse into nested declarations-of-declarations.
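If the second option is taken, a search to arbitrary depth could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: `Node` and `findAtAnyDepth` are hypothetical names, and the declaration tree is assumed to be already parsed. It walks level by level (breadth-first) so that a top-level match still wins over a deeper declaration with the same name, which matches the "top-level first" ordering described in the doc comment.

```go
package main

import "fmt"

// Node is a hypothetical, already-parsed declaration with nested children.
type Node struct {
	Name     string
	Kind     string
	Children []Node
}

// findAtAnyDepth searches the declaration tree one level at a time, so
// shallower declarations are always preferred over deeper ones.
func findAtAnyDepth(roots []Node, name string) (Node, bool) {
	level := roots
	for len(level) > 0 {
		var next []Node
		for _, n := range level {
			if n.Name == name {
				return n, true
			}
			// Queue this node's children for the next, deeper level.
			next = append(next, n.Children...)
		}
		level = next
	}
	return Node{}, false
}

func main() {
	roots := []Node{
		{Name: "Outer", Kind: "class_declaration", Children: []Node{
			{Name: "Inner", Kind: "class_declaration", Children: []Node{
				{Name: "helper", Kind: "method_definition"},
				{Name: "deep", Kind: "method_definition"},
			}},
		}},
		{Name: "helper", Kind: "function_declaration"},
	}

	n, ok := findAtAnyDepth(roots, "deep")
	fmt.Println(n.Kind, ok)

	n, ok = findAtAnyDepth(roots, "helper")
	fmt.Println(n.Kind, ok)
}
```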

Comment on lines +28 to +38
// Search nested declarations (methods inside classes, etc.)
for _, decl := range decls {
	nested := extractChildDeclarationsFromText(config, decl.Text)
	if text, kind, found := findSymbol(nested, symbolName); found {
		return text, kind, nil
	}
}

// Build list of available symbols for the error message
available := listSymbolNames(config, decls)
return "", "", fmt.Errorf("symbol %q not found. Available symbols: %s", symbolName, strings.Join(available, ", "))
Copilot AI Feb 9, 2026

If the symbol is not found, this code re-parses declaration bodies multiple times (once during the nested search loop, and again while building the available-symbol list). Consider collecting nested declarations/names during the first pass (or caching per top-level decl) to avoid redundant tree-sitter parses, especially on large files.
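A single-pass variant might look like the sketch below. Everything in it is hypothetical: `Decl`, `parseNested`, and `searchNested` are illustrative names, and `parseNested` is a toy stand-in for a tree-sitter parse that treats each whitespace-separated word of a body as one nested declaration. The `parseCalls` counter exists only to show that each top-level body is parsed at most once, including on the not-found path.

```go
package main

import (
	"fmt"
	"strings"
)

// Decl is a hypothetical declaration: a name plus its source text.
type Decl struct {
	Name string
	Text string
}

// parseCalls counts how often the stand-in parser runs.
var parseCalls int

// parseNested is a toy stand-in for a tree-sitter parse of a declaration
// body: each whitespace-separated word becomes one nested declaration.
func parseNested(body string) []Decl {
	parseCalls++
	var out []Decl
	for _, w := range strings.Fields(body) {
		out = append(out, Decl{Name: w, Text: w})
	}
	return out
}

// searchNested parses each top-level body at most once. While searching,
// it records every name it sees, so the not-found path can build the
// "available symbols" list without parsing anything a second time.
func searchNested(decls []Decl, name string) (Decl, []string, bool) {
	seen := make([]string, 0, len(decls))
	for _, top := range decls {
		seen = append(seen, top.Name)
		for _, child := range parseNested(top.Text) {
			if child.Name == name {
				return child, nil, true
			}
			seen = append(seen, child.Name)
		}
	}
	return Decl{}, seen, false
}

func main() {
	decls := []Decl{
		{Name: "A", Text: "x y"},
		{Name: "B", Text: "z"},
	}
	_, names, found := searchNested(decls, "missing")
	fmt.Println(found, names, "parses:", parseCalls)
}
```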

Comment on lines +36 to +38
// Build list of available symbols for the error message
available := listSymbolNames(config, decls)
return "", "", fmt.Errorf("symbol %q not found. Available symbols: %s", symbolName, strings.Join(available, ", "))
Copilot AI Feb 9, 2026

The "Available symbols" error string is unbounded and can become very large for files with many declarations, which can bloat tool responses and logs. Consider sorting + de-duping, and/or truncating the list (e.g., first N symbols plus a count of remaining) to keep the error response size predictable.
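One possible shape for that bounded list is sketched below. `summarizeSymbols` and its `limit` parameter are hypothetical names chosen for illustration, and the "(and N more)" suffix is just one way to report the remainder.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// summarizeSymbols sorts and de-duplicates names, then keeps at most
// limit entries and appends a count of the ones left out.
// A limit below 1 is treated as 1.
func summarizeSymbols(names []string, limit int) string {
	if limit < 1 {
		limit = 1
	}

	// Sort a copy so the caller's slice is not reordered.
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)

	// After sorting, duplicates are adjacent, so one pass removes them.
	unique := make([]string, 0, len(sorted))
	for i, n := range sorted {
		if i == 0 || n != sorted[i-1] {
			unique = append(unique, n)
		}
	}

	if len(unique) <= limit {
		return strings.Join(unique, ", ")
	}
	return fmt.Sprintf("%s (and %d more)",
		strings.Join(unique[:limit], ", "), len(unique)-limit)
}

func main() {
	names := []string{"b", "a", "b", "c", "d"}
	fmt.Println(summarizeSymbols(names, 2))
	fmt.Println(summarizeSymbols(names, 10))
}
```

Sorting also makes the error message deterministic, which keeps test assertions on it stable.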

Adds InstructionsFunc to the repos toolset describing how to combine
compare_file_contents (structural diff) with get_file_contents symbol
extraction for efficient code review. Server instructions focus on
multi-tool flows only — single-tool features are already documented
in each tool's own description.